Entry Name:  "Pranab_Banerjee-MC2"

VAST Challenge 2017
Mini-Challenge 2

 

 

Team Members:

Pranab Banerjee, Boston Fusion Corp., pranab.banerjee@bostonfusion.com



Student Team: No

 

Tools Used:

 

Approximately how many hours were spent working on this submission in total?

Approximately 35 hours.

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2017 is complete?
YES

 

Video

https://drive.google.com/file/d/0Bx4gFykeYQnHbzRLUV9ZR0xac1E/view?usp=sharing

 

 

 

Questions

MC2.1 – Characterize the sensors’ performance and operation.  Are they all working properly at all times?  Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.


The provided data show that no readings from any of the sensors were registered between 11 PM on 4/30/2016 and 0:00 on 8/1/2016. This could indicate a sensor network malfunction during that period.


This period of disruption in sensor readings was discovered by looking at the "normal" statistical pattern of time intervals between successive readings for all the sensors. To compute the intervals, the time stamps in the provided data file "Sensor Data.xlsx" were converted from the specified date and time format to Unix epoch format (number of seconds since 00:00:00 UTC, 1 January 1970). This made it easier to compute the sensor reading intervals in seconds. The histograms of time intervals for all the sensors were essentially identical and showed the pattern in Figure 1 below:


Interval histogram
Figure 1: Histogram of time intervals between successive sensor measurements for sensor 1. The bin frequency axis has been truncated to an upper limit of 20.
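The epoch conversion and interval computation described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used for this submission. The timestamp format string, the sample timestamps, and the function names `to_epoch` and `reading_intervals` are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timezone

def to_epoch(stamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Convert a timestamp string (assumed to be UTC) to Unix epoch seconds."""
    parsed = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())

def reading_intervals(stamps):
    """Return the gaps, in seconds, between successive distinct timestamps."""
    epochs = sorted({to_epoch(s) for s in stamps})
    return [later - earlier for earlier, later in zip(epochs, epochs[1:])]

# Hypothetical sample: hourly readings with one longer gap.
sample = [
    "2016-04-30 21:00:00",
    "2016-04-30 22:00:00",
    "2016-04-30 23:00:00",
    "2016-05-01 03:00:00",
    "2016-05-01 04:00:00",
]
intervals = reading_intervals(sample)
histogram = Counter(intervals)  # interval length -> number of occurrences
print(histogram)
```

In this toy sample the 3600-second interval dominates the histogram and the single longer gap stands out as an outlier, which is the same pattern that Figure 1 shows at full scale.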



Note that the y-axis in Figure 1 has been truncated to a bin-frequency count of 20 to emphasize the single outlier at the far right of the histogram, which is a bin containing a single interval value. The non-truncated histogram had a y-axis range of 0 - 2199, corresponding to 2199 interval values in the first bin. The median of the intervals in this bin was 3600 seconds, which indicates that most of the time the sensor readings were taken every 3600 seconds. The outlier time interval at the extreme right of the histogram was found to be 7952000 seconds. A search for this interval in the original dataset immediately showed that it starts at 11 PM on 4/30/2016.
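As a cross-check, the length of the gap can be computed directly from the two end points quoted earlier (11 PM on 4/30/2016 and 0:00 on 8/1/2016). The sketch below assumes both time stamps are in the same time zone. It gives 92 days and 1 hour, or 7952400 seconds, which is close to the 7952000 seconds reported above; the small difference may reflect rounding in the reported value.

```python
from datetime import datetime

start = datetime(2016, 4, 30, 23, 0, 0)  # last reading before the gap
end = datetime(2016, 8, 1, 0, 0, 0)      # first reading after the gap

gap = end - start
gap_seconds = int(gap.total_seconds())
print(gap.days, "days and", gap.seconds // 3600, "hour(s) =", gap_seconds, "seconds")
```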


Next we looked at how the sensor readings relate to wind direction and to wind speed. The goal was to determine the sources of the different chemicals by correlating the wind directions corresponding to high sensor readings with the directions between [sensor, factory] pairs. To this end, we first computed the following list of directions between the sensors and factories, using the same angle measurement scheme as the wind directions specified in the provided data (North being 360/0 degrees):





Table 1: List of directional angles for pairwise combinations of factory and sensor


  
 sensor  factory        angle
     1      Roadrunner       77.4711922908485
     1          Kasios                     90
     1        Radiance       87.9098408462893
     1          Indigo       89.0122396003602
                              
     2      Roadrunner       109.179008025811
     2          Kasios       120.256437163529
     2        Radiance       93.8712562319856
     2          Indigo       103.535856369134
                              
     3      Roadrunner       137.121096396661
     3          Kasios       145.007979801441
     3        Radiance       96.9529574681739
     3          Indigo       113.355564859286
                              
     4      Roadrunner       176.820169880136
     4          Kasios       175.236358309274
     4        Radiance       99.7132510249403
     4          Indigo       125.706691400603
                              
     5      Roadrunner        221.18592516571
     5          Kasios       210.579226872489
     5        Radiance        100.04202363553
     5          Indigo       141.009005957495
                              
     6      Roadrunner       248.962488974578
     6          Kasios       265.236358309274
     6        Radiance       87.6386253418244
     6          Indigo                     90
                              
     7      Roadrunner                      0
     7          Kasios       3.17983011986423
     7        Radiance       78.1901170429717
     7          Indigo       58.4957332807958
                              
     8      Roadrunner        36.869897645844
     8          Kasios       48.8140748342903
     8        Radiance       81.3571974199955
     8          Indigo       71.9395280638008
                              
     9      Roadrunner       243.434948822922
     9          Kasios       234.090276920822
     9        Radiance        101.30993247402
     9          Indigo       177.137594773888
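A direction of this kind can be computed with a two-argument arctangent. The sketch below is only an illustration of the angle convention (0/360 degrees is north, increasing clockwise); the coordinates are hypothetical, and measuring the angle from the sensor towards the factory is an assumption, as is the function name `bearing_deg`.

```python
import math

def bearing_deg(from_xy, to_xy):
    """Compass bearing in degrees from one map point to another.

    0 (equivalently 360) is north and angles increase clockwise,
    so east is 90, south is 180 and west is 270.
    """
    dx = to_xy[0] - from_xy[0]  # eastward offset
    dy = to_xy[1] - from_xy[1]  # northward offset
    # atan2(dx, dy) measures the angle from the +y (north) axis, clockwise.
    return math.degrees(math.atan2(dx, dy)) % 360.0

# Hypothetical map coordinates (x grows eastward, y grows northward).
sensor = (0.0, 0.0)
print(bearing_deg(sensor, (10.0, 0.0)))     # factory due east of the sensor
print(bearing_deg(sensor, (0.0, 10.0)))     # factory due north of the sensor
print(bearing_deg(sensor, (-10.0, -10.0)))  # factory to the south-west
```

Swapping the arguments to `atan2` relative to the usual mathematical convention is what turns a counter-clockwise angle from east into a clockwise angle from north.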




Apart from the gap revealed by the interval histogram in Figure 1 above, anomalies were also observed in the sensor readings for several of the chemicals. Two of the visualizations that clearly showed these anomalies are (i) the sensor reading vs. wind direction plot, and (ii) the sensor reading vs. wind speed plot. These plots were created for each [sensor, chemical] combination. For lack of space, we only show two such sets of plots, in Figures 2 and 3 below, for sensors 2 and 6. The four plots in the left column of these figures show the relation between sensor reading and wind direction for all four chemicals, and the plots in the right column show the relation between sensor reading and wind speed.


Figure 2 shows plots for sensor #2, in which the anomalies are clearly visible. For example, the two plots in the top row show three anomalous readings: three isolated spikes at wind directions of 109.13, 130.20 and 227.34 degrees, and wind speeds of 0.76, 1.10 and 2.56 m/s respectively. The time stamps for these spikes are "2016-08-20 05:00:00 UTC", "2016-04-17 03:00:00 UTC", and "2016-08-02 04:00:00 UTC" respectively.


The second row (for the chemical Chlorodinine) shows one unusually high spike at a wind direction of 227.26 degrees, with a time stamp of 2016-08-02 06:00:00 UTC.

Similarly, the third row in Figure 2 shows five unusually high spikes for the chemical "AGOC-3A" at wind directions of 113.80, 155.20, 227.68, 228.02, and 228.05 degrees, with corresponding time stamps of 2016-08-20 06:00:00 UTC, 2016-12-05 06:00:00 UTC, 2016-08-01 19:00:00 UTC, 2016-08-01 10:00:00 UTC, and 2016-08-01 09:00:00 UTC respectively. Note that the plots in Figure 2 appear to show only three spikes instead of five. This is because the x-axis is compressed to save space, so some of the neighboring spikes overlap.

The plots in the bottom row of Figure 2 show highly volatile readings across nearly the whole range of wind direction and wind speed. This leads to the conclusion that sensor #2 is not good at directionally localizing the source of the chemical Appluimonia.






Figure 2: Plots of sensor reading vs. wind direction and wind speed for sensor #2




It is worth noting that a set of spikes with a Gaussian fall-off around the highest peak most likely indicates valid readings, since we expect gradually decreasing amounts of a chemical to reach a sensor as the wind direction deviates from the direction between the source and the sensor. A single isolated spike, on the other hand, most likely points to a sensor anomaly. As an example, consider the top three plots in the left column of Figure 3 below, which show readings for sensor #6. Here the peaks have an approximately Gaussian fall-off, indicating that they most likely correspond to valid readings. However, the single isolated high spike in the reading for the chemical Appluimonia at a wind direction of 227.18 degrees, corresponding to a time stamp of 2016-08-02 08:00:00 UTC, is suspect and could be linked to a sensor anomaly.




Figure 3:  Plots of sensor reading vs. wind direction and wind speed for sensor #6
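The distinction between a group of peaks with a gradual fall-off and a single isolated spike can be illustrated with a simple neighbourhood check. The sketch below is a simplified heuristic for illustration only and is not the algorithm used for this submission; the threshold, the angular window, the sample data, and the function names are all assumptions.

```python
def circular_diff(a, b):
    """Smallest absolute difference between two compass angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def isolated_spikes(readings, threshold, window_deg=10.0):
    """Flag high readings with no other high reading at a nearby wind direction.

    readings   : list of (wind_direction_deg, sensor_reading) tuples
    threshold  : hypothetical cut-off separating high readings from baseline
    window_deg : hypothetical angular neighbourhood searched for support
    """
    high = [(angle, value) for angle, value in readings if value > threshold]
    flagged = []
    for i, (angle, value) in enumerate(high):
        supported = any(
            j != i and circular_diff(angle, other_angle) <= window_deg
            for j, (other_angle, _) in enumerate(high)
        )
        if not supported:
            flagged.append((angle, value))
    return flagged

# Hypothetical data: a bell-shaped group near 90 degrees and a lone spike at 227.
sample = [(80, 2.0), (85, 4.0), (90, 6.0), (95, 4.1), (100, 2.2),
          (227, 9.0), (300, 0.3), (310, 0.2)]
print(isolated_spikes(sample, threshold=1.5))
```

With these made-up values, the readings around 90 degrees support one another and are kept as plausible detections, while the lone reading at 227 degrees is flagged as a candidate anomaly.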




A Dirichlet-process-based machine learning algorithm was developed to automatically detect such isolated spikes. Based on a combination of this algorithm and visual analytics using plots like those in Figures 2 and 3, the following unexpected sensor readings were discovered:


Table 2: List of unexpected sensor readings
Sensor  Interpolated  Interpolated  Date Time                  Chemical
        Angle (deg)   Speed (m/s)
     1  88.90         0.6333333     2016-12-07 01:00:00 UTC    Methylosmolene
     1  155.20        0.8           2016-12-05 06:00:00 UTC    AGOC-3A
     1  227.19        2.56          2016-08-02 08:00:00 UTC    Appluimonia
     1  104.47        0.53          2016-08-20 04:00:00 UTC    Appluimonia
     1  109.13        0.76          2016-08-20 05:00:00 UTC    Appluimonia

     2  227.34        2.56          2016-08-02 04:00:00 UTC    Methylosmolene
     2  227.26        2.56          2016-08-02 06:00:00 UTC    Chlorodinine
     2  228.06        2.55          2016-08-01 09:00:00 UTC    AGOC-3A
     2  228.02        2.55          2016-08-01 10:00:00 UTC    AGOC-3A
     2  227.68        2.55          2016-08-01 19:00:00 UTC    AGOC-3A
     2  155.20        0.80          2016-12-05 06:00:00 UTC    AGOC-3A
     2  303.60        1.20          2016-04-04 13:00:00 UTC    Appluimonia
     2  125.57        0.67          2016-04-14 07:00:00 UTC    Appluimonia
     2  227.22        2.56          2016-08-02 07:00:00 UTC    Appluimonia
     2  335.83        1.13          2016-12-06 19:00:00 UTC    Appluimonia

     3  227.53        2.56          2016-08-01 23:00:00 UTC    Methylosmolene
     3  Almost all readings for wind direction about 145 degrees  Chlorodinine
     3  227.86        2.55          2016-08-01 14:00:00 UTC    AGOC-3A
     3  227.60        2.55          2016-08-01 14:00:00 UTC    AGOC-3A
     3  This sensor is unreliable and noisy for detecting this chemical  Appluimonia

     4  This sensor is unreliable and noisy for detecting this chemical  Appluimonia

     5  158.60        0.60          2016-08-12 06:00:00 UTC    AGOC-3A
     5  158.60        0.60          2016-08-12 06:00:00 UTC    Chlorodinine
     5  155.20        0.80          2016-12-05 06:00:00 UTC    Appluimonia

     6  227.18        2.5           2016-08-02 08:00:00 UTC    Appluimonia

     7  124.83        0.43          2016-04-19 01:00:00 UTC    Methylosmolene
     7  239.87        0.47          2016-04-19 02:00:00 UTC    Methylosmolene
     7  357.17        0.77          2016-04-19 05:00:00 UTC    Methylosmolene
     7  354.80        0.80          2016-04-14 15:00:00 UTC    AGOC-3A
     7  358.30        0.90          2016-04-19 06:00:00 UTC    AGOC-3A

     8  254.26        0.57          2016-04-29 05:00:00 UTC    Chlorodinine
     8  59.00         0.80          2016-04-16 12:00:00 UTC    Appluimonia

     9  158.9         0.6           2016-04-11 03:00:00 UTC    Methylosmolene
     9  This sensor is unreliable and noisy for detecting this chemical  Chlorodinine
     9  273.97        0.97          2016-12-15 10:00:00 UTC    AGOC-3A


The list above shows that most of the anomalies were observed during mid to late April 2016, the first week of August 2016, and early to mid December 2016.




MC2.2 – Now turn your attention to the chemicals themselves.  Which chemicals are being detected by the sensor group?  What patterns of chemical releases do you see, as being reported in the data?

Limit your response to no more than 6 images and 500 words.


The sensor reading vs. wind direction plots mentioned above show the detected chemicals as higher-than-baseline sensor readings.


Here are 6 such plots (in addition to the two shown in Figures 2 and 3 above).





Figure 4: Plots of sensor reading vs. wind direction and wind speed for sensor #1











Figure 5: Plots of sensor reading vs. wind direction and wind speed for sensor #3











Figure 6:  Plots of sensor reading vs. wind direction and wind speed for sensor #4









Figure 7: Plots of sensor reading vs. wind direction and wind speed for sensor #5







Figure 8: Plots of sensor reading vs. wind direction and wind speed for sensor #7






Figure 9: Plots of sensor reading vs. wind direction and wind speed for sensor #8




Chemical detection by sensor

A Dirichlet-process-based unsupervised clustering algorithm was developed to automatically detect clusters of such significant sensor readings. The combination of the output of this clustering algorithm and visual analysis of the sensor reading vs. wind direction plots (Figures 4 through 9 above) gives the following information about the chemicals being detected by the deployed sensors:


Sensor 1 is detecting:  Chlorodinine, AGOC-3A, Appluimonia

Sensor 2 is detecting: Methylosmolene, Chlorodinine, AGOC-3A

Sensor 3 is detecting: Methylosmolene, AGOC-3A

Sensor 4 is detecting: Methylosmolene, Chlorodinine, AGOC-3A

Sensor 5 is detecting: Methylosmolene, Chlorodinine, AGOC-3A, and Appluimonia

Sensor 6 is detecting: Methylosmolene, Chlorodinine, AGOC-3A, and Appluimonia

Sensor 7 is detecting: Methylosmolene, Chlorodinine, AGOC-3A

Sensor 8 is detecting: Methylosmolene, Chlorodinine, AGOC-3A

Sensor 9 is detecting: Methylosmolene, AGOC-3A, and Appluimonia



 

MC2.3 – Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.

Limit your response to no more than 8 images and 1000 words.


The chemicals released by the factories were determined by correlating the directional angles of significant sensor readings for each sensor with the factory-to-sensor angles listed in Table 1. An unsupervised clustering algorithm was developed to detect significant sensor readings above the baseline for each sensor-chemical pair. Singleton clusters were eliminated as outliers. The largest cluster (corresponding to baseline readings) was also rejected. The direction for a cluster was taken to be the wind direction of the sensor measurement closest to the center of the cluster. The factory most closely aligned with this direction from the sensor was then assigned the chemical corresponding to the cluster. The results obtained by this algorithm are shown in Table 3.



Table 3:  List of chemicals from factories and the sensors that detect them

 

Factory     Chemical        Sensor used for determination
Indigo      Methylosmolene  Sensor 8
Indigo      Chlorodinine    Sensor 2
Indigo      AGOC-3A         Sensor 9
Indigo      Appluimonia     Sensor 5, Sensor 6, Sensor 9
Kasios      Methylosmolene  Sensor 3, Sensor 4, Sensor 5, Sensor 6, Sensor 7, Sensor 8, Sensor 9
Kasios      Chlorodinine    Sensor 1, Sensor 4, Sensor 5, Sensor 6, Sensor 7, Sensor 8
Kasios      AGOC-3A         Sensor 1, Sensor 3, Sensor 4, Sensor 5, Sensor 6, Sensor 7, Sensor 8, Sensor 9
Radiance    Methylosmolene  Sensor 7
Radiance    Chlorodinine    Sensor 7
Radiance    AGOC-3A         Sensor 6
Roadrunner  Methylosmolene  Sensor 2, Sensor 3, Sensor 4, Sensor 5
Roadrunner  Chlorodinine    Sensor 1, Sensor 4, Sensor 5, Sensor 8
Roadrunner  Appluimonia     Sensor 1, Sensor 5
Roadrunner  AGOC-3A         Sensor 2, Sensor 3, Sensor 4, Sensor 5, Sensor 6, Sensor 8, Sensor 9
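The attribution procedure described before Table 3 can be sketched as follows. The details of the Dirichlet-process algorithm used for this submission are not given here, so the sketch swaps in DP-means, a simple hard-assignment method related to Dirichlet process mixtures that opens a new cluster whenever a point is too far from every existing center. The penalty `lam`, the sample readings, and all function names are assumptions; the factory angles are the sensor 9 values from Table 1, rounded.

```python
def dp_means(values, lam, max_iter=100):
    """Cluster 1-D values with DP-means; returns (centers, labels).

    A point whose squared distance to every center exceeds lam opens a
    new cluster, so the number of clusters is not fixed in advance.
    """
    centers = [sum(values) / len(values)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        changed = False
        for i, v in enumerate(values):
            dists = [(v - c) ** 2 for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                centers.append(v)
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Move each non-empty cluster's center to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
        if not changed:
            break
    return centers, labels

def circular_diff(a, b):
    """Smallest absolute difference between two compass angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def attribute_sources(samples, factory_angles, lam):
    """Assign a factory to each significant cluster of readings.

    samples        : (wind_direction_deg, reading) tuples for one
                     sensor-chemical pair
    factory_angles : {factory_name: angle_deg} for that sensor
    """
    values = [reading for _, reading in samples]
    centers, labels = dp_means(values, lam)
    sizes = {k: labels.count(k) for k in set(labels)}
    baseline = max(sizes, key=sizes.get)  # largest cluster = baseline readings
    sources = []
    for k, size in sizes.items():
        if size <= 1 or k == baseline:
            continue  # drop singleton outliers and the baseline cluster
        members = [s for s, lab in zip(samples, labels) if lab == k]
        # Wind direction of the measurement closest to the cluster center.
        direction, _ = min(members, key=lambda s: abs(s[1] - centers[k]))
        sources.append(min(factory_angles,
                           key=lambda f: circular_diff(direction, factory_angles[f])))
    return sources

# Sensor 9 angles from Table 1 (rounded); the readings below are made up.
sensor9_angles = {"Roadrunner": 243.43, "Kasios": 234.09,
                  "Radiance": 101.31, "Indigo": 177.14}
samples = [(10, 0.2), (40, 0.3), (70, 0.25), (120, 0.3), (200, 0.35),
           (300, 0.2), (330, 0.3), (174, 5.0), (177, 5.4), (180, 4.8),
           (50, 12.0)]
print(attribute_sources(samples, sensor9_angles, lam=4.0))
```

In this made-up example the seven low readings form the baseline cluster, the lone reading of 12.0 is discarded as a singleton, and the remaining cluster near 174 to 180 degrees is matched to the factory with the closest angle.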

 


Inference on chemical emission by factories

Based on this table, we can make the following inferences about the chemicals emitted by the factories.


Since the emissions of Methylosmolene, Chlorodinine, and AGOC-3A from the factory Indigo are detected by only one sensor each, and these are some of the farthest sensors from the factory, these measurements are most likely erroneous. So, most likely, Indigo only emits Appluimonia, since this is detected by three separate sensors.


It is highly likely that the factory Kasios emits Methylosmolene, Chlorodinine, and AGOC-3A, since these are detected by a large number of sensors.


The factory Radiance does not emit Appluimonia. Since only sensor #7 detects Methylosmolene and Chlorodinine from Radiance, and sensor #6, which is much closer to Radiance, does not detect them, these detections by sensor #7 are most likely erroneous. Since AGOC-3A from Radiance is detected only by the closest sensor, #6, and not by any other sensor, the level of AGOC-3A emission from this factory is probably low.


The factory Roadrunner seems to be emitting all four chemicals, since each of these is detected by multiple sensors.


Factory patterns of operation
To characterize the patterns of operation of the factories, the sensor readings attributed to each factory by the sensors that detect it (as shown in Table 3) were computed over the time of day, irrespective of the date. Figures 10 through 17 show such plots for the different factory-chemical combinations:




Figure 10: Appluimonia readings for factory Indigo


Figure 10 shows that the factory Indigo is mostly active between 5 AM and 10 PM, as the sensor readings outside those hours are quite low.




Figure 11:  Methylosmolene readings for factory Kasios




Figure 12:   Chlorodinine readings for factory Kasios



Figure 13:   AGOC-3A readings for factory Kasios


Figures 11 through 13 show that the factory Kasios is operational 24 hours a day, but that it conducts different activities at different times of the day. For example, the process that produces Methylosmolene is carried out between 6 PM and 2 AM; the process that produces Chlorodinine is active 24 hours a day; and the process that emits AGOC-3A is operational between 3 AM and 5 PM.




Figure 14:   Methylosmolene readings for factory Roadrunner



Figure 15:  Chlorodinine readings for factory Roadrunner




Figure 16:   AGOC-3A readings for factory Roadrunner


Figure 17:   Appluimonia readings for factory Roadrunner


Figures 14 through 17 show the pattern for the factory Roadrunner. Here, the process that produces Methylosmolene is active between 6 PM and 1 AM; the process that produces Chlorodinine is active between 2 AM and 9 AM, and again between 5 PM and 10 PM; the process that emits AGOC-3A is operational between 2 AM and 5 PM; and the process that emits Appluimonia is operational nearly the whole day.
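Time-of-day profiles of the kind discussed for Figures 10 through 17 can be approximated by grouping readings by hour, irrespective of the date. The sketch below is a minimal illustration under stated assumptions: the use of a mean per hour, the timestamp format, the sample records, and the function name `mean_reading_by_hour` are not taken from the submission.

```python
from collections import defaultdict
from datetime import datetime

def mean_reading_by_hour(records, fmt="%Y-%m-%d %H:%M:%S"):
    """Average sensor readings per hour of day (0-23), ignoring the date.

    records: list of (timestamp_string, reading) tuples.
    """
    buckets = defaultdict(list)
    for stamp, reading in records:
        hour = datetime.strptime(stamp, fmt).hour
        buckets[hour].append(reading)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}

# Hypothetical readings taken on different dates at the same hours.
records = [
    ("2016-04-01 05:00:00", 2.0),
    ("2016-08-01 05:00:00", 4.0),
    ("2016-12-01 05:00:00", 3.0),
    ("2016-04-01 23:00:00", 0.5),
    ("2016-08-01 23:00:00", 0.1),
]
profile = mean_reading_by_hour(records)
print(profile)
```

Hours whose average stays near the baseline would suggest that the corresponding process is idle, while sustained higher averages would suggest active hours.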